Subject: Unattended Operation of Multics



Several Multics installations have been attempting to run
their systems without an operator present. Although this is not
an advertised feature of Multics, these sites have done pretty
well by leaving the system running when the operators go off
shift, with the understanding that the system will be available
without backup or tape mounting facilities, and that if the
system crashes it will stay down until an operator comes in.
This memorandum describes changes to the supervisor and support
procedures which would make unattended operation easier and more
reliable. With some very simple changes, unattended mode becomes
usable almost immediately; later changes which make additional
improvements in unattended mode are also described.



AUTOMATIC RESTART AFTER A SYSTEM CRASH 

When Multics crashes, the operator usually brings the system
back up again. Almost all of the steps which the operator takes
can be automated. These steps are:

1. Determine that the system has crashed. 

2. Invoke the "CRASH" runcom. 

3. Invoke the LD355 and BOOT commands.

4. Reply "startup" to the Initializer process.

5. Start the Backup and IO daemons.

Existing facilities are sufficient for automating many of these
steps. Only minor changes need be made to fix the others.



Determining When the System Has Crashed

Crashes will be detected when the system returns to BOS from
Multics operation. There are some cases of system crash, such as
power failure, idle loop, or hung Initializer process, which do
not return to BOS; these cases are not handled by the new



Multics Project internal working documentation. Not to be 
scheme, but are fortunately fairly rare. There are also cases in
which the system returns to BOS but has not crashed, for instance
after a normal shutdown, or in response to the system_control_
"bos" command; modifications will be made to the supervisor and
to BOS to detect these cases.

We will set up one word of storage in the BOS toehold for
intercommunication between Multics and BOS. This word will be
used as a set of 36 switches which will be set either by a BOS
command or by a Multics privileged call. One of these switches
will be reserved to mean "automatic reboot mode is on"; this
switch will be turned ON by a Multics command, and will be turned
OFF by Multics command, BOS command, or by automatic crash-loop
detecting code in Multics startup. Another switch will mean "the
system crashed": this switch will be set ON at boot time and
reset by normal shutdown and calls to pmut$bos_and_return.
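
The switch word described above can be pictured with a short
sketch. The Python below is illustrative only: the bit positions,
the names AUTO_REBOOT and CRASHED, and the helper functions are
assumptions made for this example and do not describe actual BOS
or supervisor code.

```python
# Illustrative model of the 36-bit toehold switch word.
# Bit assignments and all names here are hypothetical.
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1

AUTO_REBOOT = 1 << 35  # "automatic reboot mode is on" (assumed position)
CRASHED = 1 << 34      # "the system crashed" (assumed position)


def set_switch(word, switch):
    """Return the word with the given switch turned ON."""
    return (word | switch) & WORD_MASK


def reset_switch(word, switch):
    """Return the word with the given switch turned OFF."""
    return word & ~switch & WORD_MASK


def is_set(word, switch):
    """True if the given switch is ON in the word."""
    return bool(word & switch)


def at_boot(word):
    """At boot time, presume a crash until a clean exit says otherwise."""
    return set_switch(word, CRASHED)


def at_normal_shutdown(word):
    """A normal shutdown (or bos_and_return call) clears the presumption."""
    return reset_switch(word, CRASHED)
```

Setting the "crashed" switch pessimistically at boot means that
any path back to BOS which does not pass through the shutdown
code is treated as a crash without needing code of its own.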

When the system returns to BOS, any runcom which contained
the BOOT command will continue on to its next statement. In the
standard runcom, this statement will be a test of the "crashed"
switch, to discover whether the system crashed or shut down
normally.

It is already possible to distinguish between the case of a
Return-To-BOS caused by the Multics software and one caused by an
XED 4000 from the processor panel. Since the latter case implies
that someone is present in the computer room, it should not be
assumed that this action is a "crash." Similarly, the setting of
the "crashed" switch will allow BOS to distinguish between calls
to pmut$bos and pmut$bos_and_return; the second call is made
only in response to manual intervention and so should not be
detected as a crash.
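
The tests applied on a return to BOS can be summarized in a small
decision function. This is a sketch under stated assumptions: the
function name, its three boolean inputs, and the result strings
are invented for illustration, and a real runcom would express
the same tests with BOS commands.

```python
def action_on_return_to_bos(crashed, auto_reboot, entered_from_panel):
    """Decide whether BOS should start automatic recovery.

    crashed            -- state of the hypothetical "crashed" switch
    auto_reboot        -- state of the "automatic reboot mode" switch
    entered_from_panel -- True if BOS was entered from the processor panel
    """
    if entered_from_panel:
        # Someone is in the computer room; do not treat this as a crash.
        return "wait for operator"
    if not crashed:
        # Normal shutdown, or a manual bos_and_return call.
        return "wait for operator"
    if not auto_reboot:
        # A real crash, but the site has not enabled automatic reboot.
        return "wait for operator"
    return "run recovery"
```

Only the combination of a software return to BOS, the "crashed"
switch ON, and automatic reboot mode ON leads to recovery; every
other case leaves the system waiting at BOS command level.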

Some changes could be made in the scheduler to detect idle
loops, and it should be possible to set up a detector for the
cases in which the Initializer process is hung; but these are
refinements which can be proposed separately. A special command
to cause the system to crash should also be constructed which can
be used by trustworthy installation personnel in cases where they
notice that the system should be restarted. (Special access
privilege will be provided on the gate which performs this
function.)



Invoking the CRASH Runcom

When the operator discovers a system crash, he usually types
CRASH immediately, unless it is obvious that a hardware failure
will prevent recovery from working. The usual recovery
procedures consist of the following commands:

FDUMP (unless inhibited)

FD355

BLAST CRASH

ESD

SALV (if necessary) 

LD355 

BOOT 

The actual runcoms will of course be much more complicated, in
the style of MOSN-2**!. Various switches will be added to allow
the operator to request that the system pause before booting and
salvaging, after crashing, and so on. Similarly, modes to
suppress crash dumps altogether or to take printer and/or tape
dumps in addition to the FDUMP will be defined.

In order to prevent the system from cycling in a tight loop
of boot - crash immediately - recover - boot, several small
modifications will be made to the scheme so far outlined, so that
the default after an automatic reboot is to turn off automatic
mode, until the answering service determines that the system is
truly up, and is not in a crash loop. The advantage to this
approach is that the full Multics environment is available for
programs which try to detect the loop. Automatic rebooting will
be discontinued after N crashes after automatic mode was entered.
It will also be turned off if there are more than M crashes in K
minutes; and admin commands will be added to the Initializer to
allow the setting of these parameters.
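
The two limits just described can be sketched as a small
bookkeeping class. This is a minimal illustration, not the
proposed implementation: the class name, its methods, and the use
of minutes as the time unit are assumptions, and the parameters
n_total, m_recent and window_minutes stand for N, M and K.

```python
from collections import deque


class CrashLoopGuard:
    """Track crashes since automatic mode was entered (hypothetical)."""

    def __init__(self, n_total, m_recent, window_minutes):
        self.n_total = n_total        # N: total crashes allowed
        self.m_recent = m_recent      # M: crashes tolerated per window
        self.window = window_minutes  # K: window length in minutes
        self.total = 0
        self.recent = deque()         # crash times inside the window
        self.auto_reboot = True

    def record_crash(self, now_minutes):
        """Note a crash; return True if automatic reboot stays enabled."""
        self.total += 1
        self.recent.append(now_minutes)
        # Forget crashes that have aged out of the K-minute window.
        while self.recent and now_minutes - self.recent[0] > self.window:
            self.recent.popleft()
        if self.total >= self.n_total or len(self.recent) > self.m_recent:
            self.auto_reboot = False  # stays off until re-enabled by hand
        return self.auto_reboot
```

For instance, with N = 5, M = 2 and K = 10, crashes at minutes 0
and 3 leave automatic mode on, and a third crash at minute 6 turns
it off because three crashes fall inside one ten-minute window.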



The restarting of the system will be done as a part of the
runcom loop which the system will be trapped in, unless a "manual
intervention" switch is set or automatic mode has been turned
off. Tape-positioning operations may have to be inserted in the
runcom to position the unified boot load tape (see MT6-i3j). The
drive on which the unified tape is mounted must be unavailable
for use by the supervisor tape-assignment code, so that the tape
can remain permanently mounted.



Replying to Initializer

When the Initializer has completed hardcore initialization,
it enters the ring-1 environment and waits for an input command
from the master console which tells it whether to do a cold boot,
enter BOS, or start up the system. The system_startup_ program
must be modified to check the switches in the toehold and to
manufacture a "startup" command if "automatic reboot" mode is ON.
The ring-1 environment will then proceed to call
system_control_$startup for answering service initialization.
Starting the Daemons

Most installations have their system_start_up.ec arranged so
that the daemons do not automatically start running, but only
come to command level, so that operators may modify parameters or
choose not to run the daemons at all. But when no operator is
present, the daemons will then hang. A simple solution to this
problem is to have running unattended imply running without any
backup or IO daemons. Alternatively, the standard action taken
by system_start_up.ec could be to include the startup of the
daemons. Neither of these choices is attractive. The best
solution is to make an active function available which will
return "true" when the system is in "automatic reboot" mode, so
that the system_start_up.ec can take a default action when the
system is unattended, and otherwise wait for the operator.



FACILITIES AVAILABLE IN UNATTENDED MODE



The other duties of the system operator relate to the
tending of the printer and the mounting of tapes. If we provide
a per-system switch which tells whether the system is unattended,
it will be fairly easy to cause the system software to reject all
requests which cannot be handled automatically, and to suppress
messages which have no function except to prompt the operator.
For example, user tape-mount requests which cannot be satisfied
without human intervention should be rejected by the
tape-management software. Similarly, there seems to be little
harm in attempting to run the IO daemon, but when the printer
paper jams or runs out, the message describing this situation
need be printed only once; the device driver should then wait
quietly for someone to fix its problem.
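
Both behaviors can be sketched briefly. The names below
(handle_tape_mount, OnceOnlyMessage, and the result strings) are
hypothetical and chosen only for this illustration; they do not
correspond to existing tape-management or IO daemon interfaces.

```python
def handle_tape_mount(unattended, volume, premounted):
    """Decide what to do with a user tape-mount request."""
    if volume in premounted:
        return "attach"        # can be satisfied with no human help
    if unattended:
        return "reject"        # nobody is present to mount the tape
    return "ask operator"


class OnceOnlyMessage:
    """Report a device problem once, then stay quiet until it clears."""

    def __init__(self):
        self.reported = set()

    def report(self, device, problem):
        """Print the message the first time only; return True if printed."""
        key = (device, problem)
        if key in self.reported:
            return False
        self.reported.add(key)
        print(f"{device}: {problem}")
        return True

    def cleared(self, device, problem):
        """Call when someone has fixed the problem."""
        self.reported.discard((device, problem))
```

A driver would call report() each time it retries the device and
cleared() once the device works again, so that a later recurrence
of the same problem is announced afresh.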

Once "unattended mode" is set, user programs should be able
to determine that there is no operator present, and when one will
be available, if this is known. A flag in whotab will be used to
indicate the system mode, and a few more words to show the time
this mode will change. The system mode and schedule data will be
set by issuing an Initializer process command. (Perhaps adding
and deleting an operator should be thought of as dynamic
reconfiguration.) When the system changes from unattended to
attended, the flag in whotab may be used, at installation option,
for some other information such as the initials of the chief
operator on duty. The subroutine system_info_ and the
command/active function "system" will be updated to return these
new items.



BACKUP 

The major problem with running the system in unattended mode
is that the incremental backup daemon will eventually fill its
output backup dump tape. When it does, it will request another
tape number from the operator. The reply to this question could
have been issued in advance, by a call to "reply" in
system_start_up.ec, but the daemon will then take this string and
call the tape software to have a tape mounted. Our suggestion so
far has been to refuse to mount the tape.

It will not be difficult to change the tape-assignment
package to make it work differently for the Backup daemon if the
system is unattended. When the operator is placing the system
into unattended mode, he will load all available drives with
blank backup tapes, and instruct the tape-assignment software to
assign these tapes one by one to the daemon as needed. This
requires that the tape mount software perform a slightly
different action when assigning a tape, that the tape DIM not
dump a tape at mount time if it is already mounted, and also that
the tapes which are not at load point when Multics is booted
should be (file marked and) unloaded.
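
The one-by-one assignment of preloaded tapes might look like the
following sketch. The class PreloadedTapePool, its methods, and
the drive and tape names used with it are assumptions made for
illustration; the actual tape-assignment package is not shown.

```python
class PreloadedTapePool:
    """Blank backup tapes loaded by the operator before leaving."""

    def __init__(self, loaded):
        # loaded: list of (drive_name, tape_label) pairs, in the
        # order in which they should be handed to the daemon.
        self.available = list(loaded)
        self.current = None

    def next_tape(self):
        """Assign the next preloaded tape, or None if none remain."""
        if not self.available:
            self.current = None
            return None
        self.current = self.available.pop(0)
        return self.current

    def remaining(self):
        """Number of preloaded tapes not yet assigned."""
        return len(self.available)
```

When next_tape() returns None the pool is exhausted, and the
daemon is back in the situation described earlier, in which the
mount request has to be refused.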



CONFIGURATION 



The bug which prevents the system from coming up with more
than one CPU should be fixed, so that a large system which
crashes and successfully reboots is not limited to one CPU.
There will be cases, however, when the supervisor deconfigures
some system resources due to errors. (This is done with bulk
store records currently, and will be done with memory in the
future. With the new Storage system, disk drives may also be
treated this way.) If a resource has been dropped due to error,
an automatic restart should know not to try to use the device
again.



GENERALIZATIONS OF THIS SCHEME 

Unattended mode should probably be implemented as a set of
switches, one for automatic rebooting, another for tape-drive
handling, and so forth. Some installations may wish to provide
weekend service with no operator in the computer room but with an
attended operator console distant from the machine room (via the
message coordinator). There are also devices on the market which
automatically find a tape and load it onto a tape drive in
response to a software order, so that an unattended system might
be able to answer most tape mount requests. Another possibility
which might be used by some sites is that one or more IO devices
(card readers, tape drives, printers) might be located in an area
accessible to trusted users, who would be able to issue commands
to attach and use these devices even when no operator is
attending the main computer.



The unattended modes should probably be implemented so that,
when a problem occurs, it will be convenient for someone to come
in, fix the problem, and let the system continue unattended. For
example, if a printer runs out of paper, and a system programmer
happens to pass the machine, he should be able to insert a fresh
box of paper and allow the Daemon to continue printing, without
having to issue any Initializer commands, or having to affect
unattended mode with respect to tapes.



